Succinct colored de Bruijn graphs

Authors

  • Martin D. Muggli
  • Alexander Bowe
  • Noelle R. Noyes
  • Paul S. Morley
  • Keith E. Belk
  • Robert Raymond
  • Travis Gagie
  • Simon J. Puglisi
  • Christina Boucher
Abstract

Motivation: In 2012, Iqbal et al. introduced the colored de Bruijn graph, a variant of the classic de Bruijn graph, which is aimed at 'detecting and genotyping simple and complex genetic variants in an individual or population'. Because they are intended to be applied to massive population level data, it is essential that the graphs be represented efficiently. Unfortunately, current succinct de Bruijn graph representations are not directly applicable to the colored de Bruijn graph, which requires additional information to be succinctly encoded as well as support for non-standard traversal operations.

Results: Our data structure dramatically reduces the amount of memory required to store and use the colored de Bruijn graph, with some penalty to runtime, allowing it to be applied in much larger and more ambitious sequence projects than was previously possible.

Availability and Implementation: https://github.com/cosmo-team/cosmo/tree/VARI.

Contact: [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.
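As a rough illustration of what a colored de Bruijn graph stores, the sketch below keeps each k-mer edge together with the set of samples (colors) it occurs in, using an ordinary hash map. This is not the succinct representation the abstract describes, and it is not taken from the authors' code; the function names `build_colored_dbg` and `successors`, the sample labels, and the toy sequences are all hypothetical. The color-restricted lookup is only one plausible example of a traversal that an uncolored graph would not need.

```python
from collections import defaultdict


def build_colored_dbg(samples, k):
    """Map every k-mer (edge) to the set of sample labels (colors) containing it.

    `samples` is a dict: color label -> list of sequences (hypothetical input format).
    """
    edges = defaultdict(set)
    for color, sequences in samples.items():
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                edges[seq[i:i + k]].add(color)
    return edges


def successors(edges, node, color=None):
    """Return the (k-1)-mer nodes reachable from `node` over one edge.

    If `color` is given, only edges carrying that color are followed.
    """
    reachable = []
    for base in "ACGT":
        edge = node + base
        if edge in edges and (color is None or color in edges[edge]):
            reachable.append(edge[1:])
    return reachable


# Toy data: two samples that differ at a single base.
samples = {"sample_a": ["ACGTAC"], "sample_b": ["ACGTTC"]}
graph = build_colored_dbg(samples, k=4)

print(sorted(graph["ACGT"]))                 # colors sharing the edge ACGT
print(successors(graph, "CGT"))              # branch point across all colors
print(successors(graph, "CGT", "sample_a"))  # same node, one color only
```

A plain dictionary like this spends memory on every k-mer string and every color set, which is the overhead a succinct encoding is meant to avoid on population-scale data.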


Similar resources

Rainbowfish: A Succinct Colored de Bruijn Graph Representation

The colored de Bruijn graph – a variant of the de Bruijn graph which associates each edge (i.e., k-mer) with some set of colors – is an increasingly important combinatorial structure in computational biology. Iqbal et al. demonstrated the utility of this structure for representing and assembling a collection (population) of genomes, and showed how it can be used to accurately detect genetic var...


Disentangled Long-Read De Bruijn Graphs via Optical Maps

While long reads produced by third-generation sequencing technology from, e.g., Pacific Biosciences have been shown to increase the quality of draft genomes in repetitive regions, fundamental computational challenges remain in overcoming their high error rate and assembling them efficiently. In this paper we show that the de Bruijn graph built on the long reads can be efficiently and substantial...


On k-colored Lambda Terms and Their Skeletons

The paper describes an application of logic programming to the modeling of difficult combinatorial properties of lambda terms, with focus on the class of simply typed terms. Lambda terms in de Bruijn notation are Motzkin trees (also called binary-unary trees) with indices at their leaves counting up on the path to the root the steps to their lambda binder. As a generalization of affine lambda t...


The Collatz conjecture and De Bruijn graphs

We study variants of the well-known Collatz graph, by considering the action of the 3n + 1 function on congruence classes. For moduli equal to powers of 2, these graphs are shown to be isomorphic to binary De Bruijn graphs. Unlike the Collatz graph, these graphs are very structured, and have several interesting properties. We then look at a natural generalization of these finite graphs to the 2-...


On the recognition of de Bruijn graphs and their induced subgraphs

The directed de Bruijn graphs appear often as models in computer science, because of the useful properties these graphs have. Similarly, the induced subgraphs of these graphs have applications related to the sequencing of DNA chains. In this paper, we show that the directed de Bruijn graphs can be recognized in polynomial time. We also show that it is possible to recognize in polynomial time wh...



Journal:
  • Bioinformatics

Volume 33, Issue 20

Pages: -

Publication date: 2017